Dataset info
| Statistic | Value |
|---|---|
| Number of variables | 16 |
| Number of observations | 89914 |
| Missing cells | 25333 (1.8%) |
| Duplicate rows | 0 (0.0%) |
| Total size in memory | 11.0 MiB |
| Average record size in memory | 128.0 B |
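
The summary figures above can be cross-checked against each other. The following is a minimal sketch, assuming the missing-cell percentage is computed over observations × variables and that total memory is roughly observations × average record size; both are assumptions about how the report derives its numbers, not something it states.

```python
# Figures copied from the "Dataset info" table.
n_variables = 16
n_observations = 89914
missing_cells = 25333
avg_record_bytes = 128.0

# Assumption: the percentage is taken over all cells (rows x columns).
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Assumption: total memory is rows x average record size, shown in MiB.
total_mib = n_observations * avg_record_bytes / 2**20

print(f"Total cells: {total_cells}")
print(f"Missing cells: {missing_pct:.1f}%")
print(f"Total size in memory: {total_mib:.1f} MiB")
```

Under these assumptions the recomputed values round to the 1.8% and 11.0 MiB reported in the table.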
Variables types
| Type | Count |
|---|---|
| Numeric | 4 |
| Categorical | 5 |
| Boolean | 3 |
| Date | 0 |
| URL | 0 |
| Text (Unique) | 0 |
| Rejected | 4 |
| Unsupported | 0 |
Warnings
| Message | Category |
|---|---|
| chemical_name has a high cardinality: 310 distinct values | Warning |
| city has a high cardinality: 379 distinct values | Warning |
| county has a high cardinality: 92 distinct values | Warning |
| facility_name has a high cardinality: 1521 distinct values | Warning |
| region has constant value "5" | Rejected |
| release_estimate_amount is highly skewed (γ1 = 45.90390638) | Skewed |
| release_estimate_amount has 31631 (35.2%) zeros | Zeros |
| reporting_year is highly correlated with doc_ctrl_num (ρ = 0.9999999865) | Rejected |
| state has constant value "IL" | Rejected |
| street_address has a high cardinality: 1632 distinct values | Warning |
| total_release is highly correlated with release_estimate_amount (ρ = 1) | Rejected |
carcinogen_chem_ind
Boolean
| Statistic | Value |
|---|---|
| Distinct count | 2 |
| Unique (%) | < 0.1% |
| Missing (%) | 0.0% |
| Missing (n) | 0 |
| Value | Count | Frequency (%) |
|---|---|---|
| N | 66737 | 74.2% |
| Y | 23177 | 25.8% |
chem_ind_3350
Boolean
| Statistic | Value |
|---|---|
| Distinct count | 2 |
| Unique (%) | < 0.1% |
| Missing (%) | 0.0% |
| Missing (n) | 0 |
| Value | Count | Frequency (%) |
|---|---|---|
| N | 62882 | 69.9% |
| Y | 27032 | 30.1% |
chemical_name
Categorical
| Statistic | Value |
|---|---|
| Distinct count | 310 |
| Unique (%) | 0.3% |
| Missing (%) | 0.1% |
| Missing (n) | 47 |
| Value | Count | Frequency (%) |
|---|---|---|
| LEAD | 5164 | 5.7% | |
| LEAD COMPOUNDS | 4258 | 4.7% | |
| ZINC COMPOUNDS | 3852 | 4.3% | |
| TOLUENE | 3211 | 3.6% | |
| COPPER | 3133 | 3.5% | |
| NICKEL | 2959 | 3.3% | |
| MANGANESE | 2949 | 3.3% | |
| CERTAIN GLYCOL ETHERS | 2821 | 3.1% | |
| XYLENE (MIXED ISOMERS) | 2799 | 3.1% | |
| CHROMIUM | 2646 | 2.9% | |
| Other values (299) | 56075 | 62.4% |
| Statistic | Value |
|---|---|
| Max length | 69 |
| Mean length | 15.23538047 |
| Min length | 3 |
| Contains chars | True |
| Contains digits | True |
| Contains spaces | True |
| Contains non-words | True |
city
Categorical
| Statistic | Value |
|---|---|
| Distinct count | 379 |
| Unique (%) | 0.4% |
| Missing (%) | 0.0% |
| Missing (n) | 0 |
| Value | Count | Frequency (%) |
|---|---|---|
| CHICAGO | 7597 | 8.4% | |
| SAUGET | 3297 | 3.7% | |
| ELK GROVE VILLAGE | 2681 | 3.0% | |
| DECATUR | 2215 | 2.5% | |
| CHANNAHON | 2205 | 2.5% | |
| GRANITE CITY | 1857 | 2.1% | |
| ROCKFORD | 1806 | 2.0% | |
| LEMONT | 1364 | 1.5% | |
| FRANKLIN PARK | 1339 | 1.5% | |
| BEDFORD PARK | 1263 | 1.4% | |
| Other values (369) | 64290 | 71.5% |
| Statistic | Value |
|---|---|
| Max length | 20 |
| Mean length | 8.601830638 |
| Min length | 3 |
| Contains chars | True |
| Contains digits | False |
| Contains spaces | True |
| Contains non-words | True |
clean_air_act_chem_ind
Boolean
| Statistic | Value |
|---|---|
| Distinct count | 2 |
| Unique (%) | < 0.1% |
| Missing (%) | 0.0% |
| Missing (n) | 0 |
| Value | Count | Frequency (%) |
|---|---|---|
| Y | 62907 | 70.0% |
| N | 27007 | 30.0% |
county
Categorical
| Statistic | Value |
|---|---|
| Distinct count | 92 |
| Unique (%) | 0.1% |
| Missing (%) | 0.0% |
| Missing (n) | 0 |
| Value | Count | Frequency (%) |
|---|---|---|
| COOK | 26266 | 29.2% | |
| WILL | 6549 | 7.3% | |
| MADISON | 4969 | 5.5% | |
| ST CLAIR | 4523 | 5.0% | |
| DUPAGE | 3887 | 4.3% | |
| WINNEBAGO | 2828 | 3.1% | |
| LAKE | 2715 | 3.0% | |
| KANE | 2596 | 2.9% | |
| MACON | 2219 | 2.5% | |
| ROCK ISLAND | 2172 | 2.4% | |
| Other values (82) | 31190 | 34.7% |
| Statistic | Value |
|---|---|
| Max length | 11 |
| Mean length | 5.861523233 |
| Min length | 3 |
| Contains chars | True |
| Contains digits | False |
| Contains spaces | True |
| Contains non-words | True |
county_code
Numeric
| Statistic | Value |
|---|---|
| Distinct count | 92 |
| Unique (%) | 0.1% |
| Missing (%) | 0.0% |
| Missing (n) | 0 |
| Infinite (%) | 0.0% |
| Infinite (n) | 0 |
| Mean | 17095.16881 |
| Minimum | 17001 |
| Maximum | 17203 |
| Zeros (%) | 0.0% |
Quantile statistics
| Statistic | Value |
|---|---|
| Minimum | 17001 |
| 5-th percentile | 17031 |
| Q1 | 17031 |
| Median | 17091 |
| Q3 | 17157 |
| 95-th percentile | 17197 |
| Maximum | 17203 |
| Range | 202 |
| Interquartile range | 126 |
Descriptive statistics
| Statistic | Value |
|---|---|
| Standard deviation | 63.0064935 |
| Coef of variation | 0.00368563155 |
| Kurtosis | -1.336177609 |
| Mean | 17095.16881 |
| MAD | 55.66664327 |
| Skewness | 0.3205553567 |
| Sum | 1537095008 |
| Variance | 3969.818223 |
| Memory size | 702.5 KiB |
| Value | Count | Frequency (%) |
|---|---|---|
| 17031 | 26266 | 29.2% | |
| 17197 | 6549 | 7.3% | |
| 17119 | 4969 | 5.5% | |
| 17163 | 4523 | 5.0% | |
| 17043 | 3887 | 4.3% | |
| 17201 | 2828 | 3.1% | |
| 17097 | 2715 | 3.0% | |
| 17089 | 2596 | 2.9% | |
| 17115 | 2219 | 2.5% | |
| 17161 | 2172 | 2.4% | |
| Other values (82) | 31190 | 34.7% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 17001 | 892 | 1.0% | |
| 17003 | 40 | < 0.1% | |
| 17005 | 87 | 0.1% | |
| 17007 | 628 | 0.7% | |
| 17011 | 117 | 0.1% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 17203 | 445 | 0.5% | |
| 17201 | 2828 | 3.1% | |
| 17199 | 633 | 0.7% | |
| 17197 | 6549 | 7.3% | |
| 17195 | 530 | 0.6% |
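
The county_code values are five-digit county FIPS codes (state prefix 17 for Illinois followed by a three-digit county number), and their counts match the county table one for one (17031 and COOK both appear 26266 times). The mean, skewness, and kurtosis above therefore describe an identifier rather than a quantity. A minimal sketch of treating the code as a label instead; `split_fips` is a hypothetical helper, not part of the report:

```python
def split_fips(code: int) -> tuple[str, str]:
    """Split a five-digit county FIPS code into (state, county) strings."""
    text = f"{code:05d}"  # keep any leading zero of the state part
    if len(text) != 5:
        raise ValueError(f"expected a five-digit code, got {code!r}")
    return text[:2], text[2:]


# Codes taken from the frequency table above.
for code in (17031, 17197, 17119):
    state, county = split_fips(code)
    print(code, "-> state", state, "county", county)
```

Keeping the parts as strings avoids accidental arithmetic on them and preserves leading zeros.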
doc_ctrl_num
Numeric
| Statistic | Value |
|---|---|
| Distinct count | 51250 |
| Unique (%) | 57.0% |
| Missing (%) | 0.0% |
| Missing (n) | 0 |
| Infinite (%) | 0.0% |
| Infinite (n) | 0 |
| Mean | 1.311596979e+12 |
| Minimum | 1.305203139e+12 |
| Maximum | 1.318217591e+12 |
| Zeros (%) | 0.0% |
Quantile statistics
| Statistic | Value |
|---|---|
| Minimum | 1.305203139e+12 |
| 5-th percentile | 1.305203856e+12 |
| Q1 | 1.308206511e+12 |
| Median | 1.311209845e+12 |
| Q3 | 1.315213936e+12 |
| 95-th percentile | 1.318216984e+12 |
| Maximum | 1.318217591e+12 |
| Range | 1.30144514e+10 |
| Interquartile range | 7007425101 |
Descriptive statistics
| Statistic | Value |
|---|---|
| Standard deviation | 4046086033 |
| Coef of variation | 0.003084854645 |
| Kurtosis | -1.214671563 |
| Mean | 1.311596979e+12 |
| MAD | 3512885753 |
| Skewness | 0.02850956773 |
| Sum | 1.179309308e+17 |
| Variance | 1.637081219e+19 |
| Memory size | 702.5 KiB |
| Value | Count | Frequency (%) |
|---|---|---|
| 1.306204299e+12 | 2 | < 0.1% | |
| 1.315213608e+12 | 2 | < 0.1% | |
| 1.315213783e+12 | 2 | < 0.1% | |
| 1.315214378e+12 | 2 | < 0.1% | |
| 1.315214339e+12 | 2 | < 0.1% | |
| 1.315214333e+12 | 2 | < 0.1% | |
| 1.31521431e+12 | 2 | < 0.1% | |
| 1.315214202e+12 | 2 | < 0.1% | |
| 1.315214299e+12 | 2 | < 0.1% | |
| 1.315214278e+12 | 2 | < 0.1% | |
| Other values (51240) | 89894 | > 99.9% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 1.305203139e+12 | 1 | < 0.1% | |
| 1.30520314e+12 | 2 | < 0.1% | |
| 1.30520314e+12 | 2 | < 0.1% | |
| 1.30520314e+12 | 2 | < 0.1% | |
| 1.30520314e+12 | 1 | < 0.1% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 1.318217591e+12 | 2 | < 0.1% | |
| 1.318217591e+12 | 1 | < 0.1% | |
| 1.318217587e+12 | 1 | < 0.1% | |
| 1.318217587e+12 | 2 | < 0.1% | |
| 1.318217587e+12 | 2 | < 0.1% |
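
Because doc_ctrl_num is printed in scientific notation with ten significant digits, distinct identifiers collapse into the same displayed value in the tables above (for example, four separate rows shown as 1.30520314e+12). The sample rows also suggest that the third and fourth digits of the 13-digit number track the reporting year (1.305...e+12 alongside 2005, 1.318...e+12 alongside 2018), which would explain the near-perfect correlation with reporting_year. A minimal sketch under that assumption; the digit layout is inferred from the displayed values only, and `year_from_doc_ctrl_num` is a hypothetical helper:

```python
def year_from_doc_ctrl_num(doc_ctrl_num: int) -> int:
    """Read digits 3-4 of a 13-digit control number as a two-digit year.

    Assumption: the layout is inferred from the report's sample rows,
    not from any official specification of the identifier.
    """
    text = str(doc_ctrl_num)
    if len(text) != 13:
        raise ValueError("expected a 13-digit control number")
    return 2000 + int(text[2:4])


# The reported minimum and maximum, expanded from their rounded display
# form; the trailing digits are not visible in the report.
print(year_from_doc_ctrl_num(1305203139000))
print(year_from_doc_ctrl_num(1318217591000))
```

Storing the identifier as an integer or string rather than a float would also keep all 13 digits visible in summaries.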
facility_name
Categorical
| Statistic | Value |
|---|---|
| Distinct count | 1521 |
| Unique (%) | 1.7% |
| Missing (%) | 0.0% |
| Missing (n) | 0 |
| Value | Count | Frequency (%) |
|---|---|---|
| VEOLIA ES TECHNICAL SOLUTIONS LLC | 2172 | 2.4% | |
| SHERWIN-WILLIAMS CO | 1277 | 1.4% | |
| ADM DECATUR COMPLEX | 1205 | 1.3% | |
| WOOD RIVER REFINERY | 1099 | 1.2% | |
| CITGO PETROLEUM CORP LEMONT REFINERY | 890 | 1.0% | |
| EXXONMOBIL OIL CORP JOLIET REFINERY | 873 | 1.0% | |
| 3M CO - CORDOVA | 869 | 1.0% | |
| MARATHON PETROLEUM CO LP ILLINOIS REFINING DIV | 863 | 1.0% | |
| US STEEL GRANITE CITY WORKS | 788 | 0.9% | |
| ROHM & HAAS CHEMICALS LLC | 593 | 0.7% | |
| Other values (1511) | 79285 | 88.2% |
| Statistic | Value |
|---|---|
| Max length | 62 |
| Mean length | 24.18462086 |
| Min length | 3 |
| Contains chars | True |
| Contains digits | True |
| Contains spaces | True |
| Contains non-words | True |
region
Constant
This variable is constant and should be ignored for analysis
| Statistic | Value |
|---|---|
| Constant value | 5 |
release_estimate_amount
Numeric
| Statistic | Value |
|---|---|
| Distinct count | 15101 |
| Unique (%) | 16.8% |
| Missing (%) | 0.0% |
| Missing (n) | 0 |
| Infinite (%) | 0.0% |
| Infinite (n) | 0 |
| Mean | 4919.633186 |
| Minimum | 0 |
| Maximum | 5680419 |
| Zeros (%) | 35.2% |
Quantile statistics
| Statistic | Value |
|---|---|
| Minimum | 0 |
| 5-th percentile | 0 |
| Q1 | 0 |
| Median | 5 |
| Q3 | 262 |
| 95-th percentile | 11682.75 |
| Maximum | 5680419 |
| Range | 5680419 |
| Interquartile range | 262 |
Descriptive statistics
| Statistic | Value |
|---|---|
| Standard deviation | 57924.88773 |
| Coef of variation | 11.774229 |
| Kurtosis | 2892.405466 |
| Mean | 4919.633186 |
| MAD | 8539.290659 |
| Skewness | 45.90390638 |
| Sum | 442343898.3 |
| Variance | 3355292618 |
| Memory size | 702.5 KiB |
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 31631 | 35.2% | |
| 5 | 4170 | 4.6% | |
| 1 | 2387 | 2.7% | |
| 250 | 1722 | 1.9% | |
| 2 | 1048 | 1.2% | |
| 3 | 676 | 0.8% | |
| 10 | 545 | 0.6% | |
| 4 | 527 | 0.6% | |
| 6 | 419 | 0.5% | |
| 0.1 | 408 | 0.5% | |
| Other values (15091) | 46381 | 51.6% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 31631 | 35.2% | |
| 1e-07 | 1 | < 0.1% | |
| 3e-07 | 1 | < 0.1% | |
| 4e-07 | 1 | < 0.1% | |
| 6e-07 | 1 | < 0.1% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 5680419 | 1 | < 0.1% | |
| 3845400 | 1 | < 0.1% | |
| 3716500 | 1 | < 0.1% | |
| 3623400 | 1 | < 0.1% | |
| 3489000 | 1 | < 0.1% |
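
The statistics for release_estimate_amount describe a heavily right-skewed variable: the median is 5, the mean is about 4920, and 35.2% of records are zero. The sketch below recomputes the coefficient of variation from the reported mean and standard deviation, then applies a log1p transform to the reported quantiles. The transform is one common way to compress such a range and is an illustrative choice, not something the report prescribes.

```python
import math

# Summary statistics copied from the release_estimate_amount tables.
mean = 4919.633186
std = 57924.88773
quantiles = {
    "Q1": 0,
    "Median": 5,
    "Q3": 262,
    "95-th percentile": 11682.75,
    "Maximum": 5680419,
}

# Coefficient of variation is the standard deviation divided by the mean.
cv = std / mean
print(f"Coef of variation: {cv:.3f}")

# log1p(x) = log(1 + x) is defined at zero, so zero-valued records
# map to 0 instead of raising a domain error as log(x) would.
for name, value in quantiles.items():
    print(f"{name}: {value} -> {math.log1p(value):.2f}")
```

After the transform the gap between the third quartile and the maximum shrinks from four orders of magnitude to a factor of about three, which makes histograms and model inputs easier to work with.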
reporting_year
Highly correlated
This variable is highly correlated with doc_ctrl_num and should be ignored for analysis
| Statistic | Value |
|---|---|
| Correlation | 0.9999999865 |
state
Constant
This variable is constant and should be ignored for analysis
| Statistic | Value |
|---|---|
| Constant value | IL |
street_address
Categorical
| Statistic | Value |
|---|---|
| Distinct count | 1632 |
| Unique (%) | 1.8% |
| Missing (%) | 0.0% |
| Missing (n) | 0 |
| Value | Count | Frequency (%) |
|---|---|---|
| 7 MOBILE AVE | 2172 | 2.4% | |
| 4666 FARIES PKWY E | 1205 | 1.3% | |
| 900 S CENTRAL AVE | 1099 | 1.2% | |
| 135TH ST & NEW AVE | 890 | 1.0% | |
| 25915 S FRONTAGE RD | 873 | 1.0% | |
| 22614 RT 84 N | 869 | 1.0% | |
| 100 MARATHON AVE | 863 | 1.0% | |
| 1951 STATE ST | 788 | 0.9% | |
| 99 E COTTAGE AVE | 502 | 0.6% | |
| 10901 BALDWIN RD | 492 | 0.5% | |
| Other values (1622) | 80161 | 89.2% |
| Statistic | Value |
|---|---|
| Max length | 36 |
| Mean length | 16.34456258 |
| Min length | 5 |
| Contains chars | True |
| Contains digits | True |
| Contains spaces | True |
| Contains non-words | True |
total_release
Highly correlated
This variable is highly correlated with release_estimate_amount and should be ignored for analysis
| Statistic | Value |
|---|---|
| Correlation | 1 |
zip
Numeric
| Statistic | Value |
|---|---|
| Distinct count | 487 |
| Unique (%) | 0.5% |
| Missing (%) | 0.0% |
| Missing (n) | 0 |
| Infinite (%) | 0.0% |
| Infinite (n) | 0 |
| Mean | 15798153.46 |
| Minimum | 60002 |
| Maximum | 626502999 |
| Zeros (%) | 0.0% |
Quantile statistics
| Statistic | Value |
|---|---|
| Minimum | 60002 |
| 5-th percentile | 60031 |
| Q1 | 60411 |
| Median | 60827 |
| Q3 | 62002 |
| 95-th percentile | 62832 |
| Maximum | 626502999 |
| Range | 626442997 |
| Interquartile range | 1591 |
Descriptive statistics
| Statistic | Value |
|---|---|
| Standard deviation | 97272694.47 |
| Coef of variation | 6.15721924 |
| Kurtosis | 34.2653051 |
| Mean | 15798153.46 |
| MAD | 30671164.14 |
| Skewness | 6.021109623 |
| Sum | 1.420475171e+12 |
| Variance | 9.461977089e+15 |
| Memory size | 702.5 KiB |
| Value | Count | Frequency (%) |
|---|---|---|
| 62201 | 3071 | 3.4% | |
| 60007 | 2670 | 3.0% | |
| 60410 | 2201 | 2.4% | |
| 62040 | 1857 | 2.1% | |
| 60439 | 1364 | 1.5% | |
| 60131 | 1301 | 1.4% | |
| 60638 | 1274 | 1.4% | |
| 60450 | 1245 | 1.4% | |
| 625265666 | 1205 | 1.3% | |
| 60901 | 1180 | 1.3% | |
| Other values (477) | 72546 | 80.7% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 60002 | 73 | 0.1% | |
| 60005 | 741 | 0.8% | |
| 60007 | 2670 | 3.0% | |
| 60008 | 184 | 0.2% | |
| 60012 | 82 | 0.1% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 626502999 | 40 | < 0.1% | |
| 625265666 | 1205 | 1.3% | |
| 622061198 | 64 | 0.1% | |
| 620600129 | 2 | < 0.1% | |
| 619201182 | 2 | < 0.1% |
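
The zip column mixes five-digit ZIP codes with nine-digit values that look like ZIP+4 codes stored without the hyphen (625265666 for the Decatur facility, for instance). This mix is why the mean (about 15.8 million) and the maximum (626502999) are so far from the median of 60827. A minimal sketch of normalizing both forms to a five-digit string, assuming every value is either five or nine digits long; `normalize_zip` is a hypothetical helper:

```python
def normalize_zip(value: int) -> str:
    """Return the five-digit ZIP code for a 5- or 9-digit integer."""
    text = str(value)
    if len(text) == 9:  # assumed ZIP+4 without the hyphen: keep the base ZIP
        return text[:5]
    if len(text) == 5:
        return text
    raise ValueError(f"unexpected ZIP length: {value!r}")


# Values taken from the tables above.
for raw in (60439, 625265666, 626502999):
    print(raw, "->", normalize_zip(raw))
```

The length check is enough here because Illinois ZIP codes start with 6; data from states whose ZIP codes begin with 0 would lose the leading zero when stored as integers and would need padding first.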
First rows
| | carcinogen_chem_ind | chem_ind_3350 | chemical_name | city | clean_air_act_chem_ind | county | county_code | doc_ctrl_num | facility_name | region | release_estimate_amount | reporting_year | state | street_address | total_release | zip |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | N | N | ETHYLENE GLYCOL | LEMONT | Y | COOK | 17031 | 1.305203e+12 | CCI MANUFACTURING IL CORP | 5 | 3930.00 | 2005 | IL | 15550 CANAL BANK RD | 3930.00 | 60439 |
| 1 | N | Y | CHROMIUM | CHICAGO | Y | COOK | 17031 | 1.305203e+12 | GE MATHIS CO | 5 | 0.00 | 2005 | IL | 6100 S OAK PARK AVE | 0.00 | 60638 |
| 2 | N | N | DECABROMODIPHENYL OXIDE | SOUTH HOLLAND | N | COOK | 17031 | 1.305203e+12 | ARMACELL LLC | 5 | 0.00 | 2005 | IL | 16800 S CANAL ST | 0.00 | 60473 |
| 3 | N | N | COPPER | OREGON | N | OGLE | 17141 | 1.305203e+12 | WEC CO | 5 | 0.99 | 2005 | IL | 2606 RT 2 S | 0.99 | 61061 |
| 4 | N | N | SULFURIC ACID (1994 AND AFTER "ACID AEROSOLS" ... | BLOOMINGTON | N | MCLEAN | 17113 | 1.305203e+12 | MICKEY TRUCK BODIES INC | 5 | 10.00 | 2005 | IL | 14661 OLD COLONIAL RD | 10.00 | 61704 |
| 5 | Y | Y | NICKEL | STREATOR | Y | LASALLE | 17099 | 1.305203e+12 | STREATOR DEPENDABLE | 5 | 0.00 | 2005 | IL | 410 W BROADWAY AVE | NaN | 61364 |
| 6 | Y | Y | LEAD | BUSHNELL | Y | MCDONOUGH | 17109 | 1.305203e+12 | VAUGHAN & BUSHNELL MANUFACTURING CO | 5 | 0.00 | 2005 | IL | 201 W MAIN ST | NaN | 61422 |
| 7 | N | N | SEC-BUTYL ALCOHOL | CHANNAHON | N | WILL | 17197 | 1.305203e+12 | IMTT ILLINOIS - JOLIET FACILITY | 5 | 110.00 | 2005 | IL | 24420 W DURKEE RD | 110.00 | 60410 |
| 8 | Y | Y | LEAD | WHEELING | Y | COOK | 17031 | 1.305203e+12 | ENGIS CORP | 5 | 0.75 | 2005 | IL | 105 W. HINTZ ROAD | 0.75 | 60090 |
| 9 | N | N | N-HEXANE | CAIRO | Y | ALEXANDER | 17003 | 1.305203e+12 | BUNGE NA INC | 5 | 289616.00 | 2005 | IL | 203 34TH ST | 289616.00 | 62914 |
Last rows
| | carcinogen_chem_ind | chem_ind_3350 | chemical_name | city | clean_air_act_chem_ind | county | county_code | doc_ctrl_num | facility_name | region | release_estimate_amount | reporting_year | state | street_address | total_release | zip |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 89904 | N | N | 1-BROMOPROPANE | MELROSE PARK | N | COOK | 17031 | 1.318218e+12 | ENVIRO TECH INTERNATIONAL INC | 5 | 750.0 | 2018 | IL | 1800 N 25TH AVE | NaN | 60160 |
| 89905 | Y | Y | LEAD | SPRING GROVE | Y | MCHENRY | 17111 | 1.318217e+12 | SCOT FORGE CO | 5 | 0.1 | 2018 | IL | 8001 WINN RD | 0.1 | 60081 |
| 89906 | Y | Y | LEAD | LIBERTYVILLE | Y | LAKE | 17097 | 1.318217e+12 | METALEX NORTH | 5 | 0.0 | 2018 | IL | 700 LIBERTY DR | 0.0 | 60048 |
| 89907 | N | N | SILVER COMPOUNDS | ELK GROVE VILLAGE | N | COOK | 17031 | 1.318217e+12 | PERFECTION PLATING INC | 5 | 1.0 | 2018 | IL | 775 MORSE AVE | 1.0 | 60007 |
| 89908 | Y | Y | NICKEL COMPOUNDS | JOLIET | Y | WILL | 17197 | 1.318217e+12 | APEX MATERIAL TECHNOLOGIES LLC | 5 | 0.0 | 2018 | IL | 10 INDUSTRY AVE | NaN | 60435 |
| 89909 | N | Y | CHROMIUM COMPOUNDS(EXCEPT CHROMITE ORE MINED I... | DECATUR | Y | MACON | 17115 | 1.318217e+12 | ADM DECATUR COMPLEX | 5 | 1.0 | 2018 | IL | 4666 FARIES PKWY E | 1.0 | 625265666 |
| 89910 | Y | Y | LEAD | CHICAGO | Y | COOK | 17031 | 1.318218e+12 | UNIVERSAL ELECTRIC FOUNDRY INC | 5 | 0.0 | 2018 | IL | 1523 W HUBBARD ST | NaN | 60642 |
| 89911 | N | N | ZINC COMPOUNDS | DECATUR | N | MACON | 17115 | 1.318217e+12 | ADM DECATUR COMPLEX | 5 | 0.0 | 2018 | IL | 4666 FARIES PKWY E | 0.0 | 625265666 |
| 89912 | N | N | TOLUENE | DECATUR | Y | MACON | 17115 | 1.318217e+12 | ADM DECATUR COMPLEX | 5 | 3.0 | 2018 | IL | 4666 FARIES PKWY E | 3.0 | 625265666 |
| 89913 | N | N | PROPYLENE | CHANNAHON | N | WILL | 17197 | 1.318217e+12 | DIVERSIFIED CPC INTERNATIONAL INC | 5 | 921.0 | 2018 | IL | 24338 W DURKEE RD | 921.0 | 60410 |